TensorFlow Object Detection

by CM


Posted on April 07, 2020



The Goal:

In this article, we will explore the TensorFlow Object Detection API. In particular, we will use TensorFlow 2 and OpenCV to run live inference on a webcam video stream, and we will try out different pretrained models to see what works best for us. To follow the tutorial, please make sure you have installed the following dependencies: TensorFlow 2.x, OpenCV, and the TF Object Detection API. In case you have not installed these dependencies before -- there is a nice tutorial by sentdex on how to install both TF and OpenCV.
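
If you are unsure whether everything is installed correctly, here is a minimal sanity check, assuming a standard pip-based setup (the package name object_detection is how the TF Object Detection API is typically importable once installed):

# Quick sanity check that the required dependencies are importable
# (assumes a standard pip installation of the TF Object Detection API)
import tensorflow as tf
import cv2
import object_detection

print("TensorFlow:", tf.__version__)
print("OpenCV:", cv2.__version__)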

Object Detection:

Given an image or a video stream, an object detection model should be able to identify a set of objects as well as their positions within the image. In other words, the models we will use in this article are trained to detect the presence and location of multiple classes of objects. The idea is that our model will output a list of the objects it detects, the location of a bounding box that contains each object, and a confidence score.
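
To make this concrete, here is an illustrative sketch of what a single detection could look like once converted to plain Python values. The keys match the ones returned by the API later in this article; the numbers are made up:

# Illustrative (made-up) detection result for one object
example_detection = {
    'detection_boxes': [[0.12, 0.20, 0.85, 0.60]],  # [ymin, xmin, ymax, xmax], normalized coordinates
    'detection_classes': [1],                       # class id, e.g. 1 == 'person' in the COCO label map
    'detection_scores': [0.94],                     # confidence score
    'num_detections': 1,
}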


Key Components:

Let's first check our TensorFlow version. We plan to build an object detection model with TensorFlow 2.x. Remember, the original object detection API by Google was designed for TF 1.x and is not fully compatible with TF 2.x. In our case, we are working with version '2.1.0'.

import tensorflow as tf
tf.__version__

Let's jump right into the code. First, we import all required dependencies. (1) pathlib offers classes representing filesystem paths with semantics appropriate for different operating systems. (2) importlib provides the implementation of the import statement (and thus, by extension, the __import__() function) and exposes the components used to implement import, making it easier to create custom importers that participate in the import process. (3) numpy adds support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays. (4) OpenCV is a library of programming functions mainly aimed at real-time computer vision. (5) The TF Object Detection API is an open-source framework built on top of TensorFlow that makes it easy to construct, train, and deploy object detection models.

import pathlib
import importlib
import numpy as np

#OpenCV
import cv2

#TensorFlow Object Detection API
from object_detection.utils import ops as utils_ops
from object_detection.utils import label_map_util
from object_detection.utils import visualization_utils as vis_util

Second, as we are using TensorFlow 2.x, we need to patch two references so that the TF 1.x-era object detection utilities keep working with TF 2.x.

# Patch the object detection utilities to use the TF1 compatibility module
utils_ops.tf = tf.compat.v1

# Point tf.gfile to tf.io.gfile
tf.gfile = tf.io.gfile

We then need to define three functions: (1) loading the model, (2) running inference on a single image, and (3) running inference on the live webcam video stream.

We will start off by building the model-loading function. We will make use of pretrained models provided by the TensorFlow detection model zoo. In addition, we will specify the path to our label map, which we will later use to map the predicted class IDs to human-readable labels in the image / video stream.

def load_model(model_name):
  base_url = 'http://download.tensorflow.org/models/object_detection/'
  model_file = model_name + '.tar.gz'
  model_dir = tf.keras.utils.get_file(
    fname=model_name,
    origin=base_url + model_file,
    untar=True)

  model_dir = pathlib.Path(model_dir)/"saved_model"

  model = tf.saved_model.load(str(model_dir))
  model = model.signatures['serving_default']

  return model

# List of the strings that are used to add the correct label to each box.
PATH_TO_LABELS = 'data/mscoco_label_map.pbtxt'
category_index = label_map_util.create_category_index_from_labelmap(PATH_TO_LABELS, use_display_name=True)
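
As a quick, hedged usage sketch (the model name is one of the COCO-trained models from the model zoo that we also use later), loading a model and looking up a label entry works like this:

# Load a COCO-trained SSD model from the model zoo and inspect one label entry
detection_model = load_model('ssd_mobilenet_v1_coco_2017_11_17')
print(category_index[1])  # -> {'id': 1, 'name': 'person'} for the COCO label map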

The second function runs inference on a single image. It takes two arguments: the model we run inference on and the input image. As we are working with TensorFlow, we need to convert the image to an input tensor, and since the model expects a batch of images, we add a batch dimension. All outputs are batched tensors, so we convert them to numpy arrays and take index [0] to remove the batch dimension. Note that detection_classes should be ints. Finally, if the model also outputs instance masks, we reframe the box masks to the size of the image so they can be drawn on top of it.

def run_inference_for_single_image(model, image):
    image = np.asarray(image)
    # The input needs to be a tensor, convert it using `tf.convert_to_tensor`.
    input_tensor = tf.convert_to_tensor(image)
    # The model expects a batch of images, so add an axis with `tf.newaxis`.
    input_tensor = input_tensor[tf.newaxis, ...]

    # Run inference
    output_dict = model(input_tensor)

    # All outputs are batched tensors.
    # Convert to numpy arrays, and take index [0] to remove the batch dimension.
    # We're only interested in the first num_detections.
    num_detections = int(output_dict.pop('num_detections'))
    output_dict = {key: value[0, :num_detections].numpy()
                   for key, value in output_dict.items()}
    output_dict['num_detections'] = num_detections

    # detection_classes should be ints.
    output_dict['detection_classes'] = output_dict['detection_classes'].astype(np.int64)

    # Handle models with masks:
    if 'detection_masks' in output_dict:
        # Reframe the bbox masks to the image size.
        detection_masks_reframed = utils_ops.reframe_box_masks_to_image_masks(
            output_dict['detection_masks'], output_dict['detection_boxes'],
            image.shape[0], image.shape[1])
        detection_masks_reframed = tf.cast(detection_masks_reframed > 0.5,
                                           tf.uint8)
        output_dict['detection_masks_reframed'] = detection_masks_reframed.numpy()
    return output_dict
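
Before wiring this up to the webcam, the function can be tried on a single still image. A minimal sketch, assuming a local file test.jpg (hypothetical path) and the detection_model loaded in the earlier sketch (or any model returned by load_model); note that OpenCV reads images in BGR order, so we convert to RGB first:

# Run the detector on one still image (hypothetical file path)
image_bgr = cv2.imread('test.jpg')
image_rgb = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2RGB)
result = run_inference_for_single_image(detection_model, image_rgb)
print(result['num_detections'])
print(result['detection_classes'][:5], result['detection_scores'][:5])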

Now we will define our run_inference function, which uses the webcam stream as input.

def run_inference(model):
    # Activate the webcam video capture
    cap = cv2.VideoCapture(0)

    while True:
        ret, image_np = cap.read()
        if not ret:
            break
        # Actual detection.
        output_dict = run_inference_for_single_image(model, image_np)
        #print(output_dict)
        # Visualization of the results of a detection.
        vis_util.visualize_boxes_and_labels_on_image_array(
            image_np,
            output_dict['detection_boxes'],
            output_dict['detection_classes'],
            output_dict['detection_scores'],
            category_index,
            instance_masks=output_dict.get('detection_masks_reframed', None),
            use_normalized_coordinates=True,
            line_thickness=4)


        cv2.imshow('Object detection', cv2.resize(image_np, (1280, 720)))
        if cv2.waitKey(25) & 0xFF == ord('q'):
            break

    # Release the webcam and close the window when we are done
    cap.release()
    cv2.destroyAllWindows()
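
The same loop also works on a recorded video instead of the live webcam. A minimal variation, assuming a hypothetical file my_video.mp4, only changes the capture source and stops when the file ends:

def run_inference_on_file(model, video_path='my_video.mp4'):
    # Same detection loop as run_inference, but reading frames from a video file
    cap = cv2.VideoCapture(video_path)
    while True:
        ret, image_np = cap.read()
        if not ret:
            # End of the video file
            break
        output_dict = run_inference_for_single_image(model, image_np)
        vis_util.visualize_boxes_and_labels_on_image_array(
            image_np,
            output_dict['detection_boxes'],
            output_dict['detection_classes'],
            output_dict['detection_scores'],
            category_index,
            instance_masks=output_dict.get('detection_masks_reframed', None),
            use_normalized_coordinates=True,
            line_thickness=4)
        cv2.imshow('Object detection', cv2.resize(image_np, (1280, 720)))
        if cv2.waitKey(25) & 0xFF == ord('q'):
            break
    cap.release()
    cv2.destroyAllWindows()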

Lastly, we will wrap everything in a main function that lets us choose which model to run.

def main():
    while True:
        choice=int(input("Do you want to run a model without instance segmentation (1) or with instance segmentation (2)? If you want to choose another model press (3).\nBe aware that instance segmentation is computationally intensive.\nTo exit the program choose (0): "))
        if choice == 1:
            print("To end this live object detection press (q).")
            #Without instance segmentation
            model_name = 'ssd_mobilenet_v1_coco_2017_11_17'
            detection_model = load_model(model_name)
            run_inference(detection_model)

        elif choice == 2:
            print("To end this live object detection press (q).")
            #With instance segmentation
            model_name = "mask_rcnn_inception_resnet_v2_atrous_coco_2018_01_28"
            masking_model = load_model(model_name)
            run_inference(masking_model)

        elif choice == 3:
            model_name=input("Please copy paste the name of your selected COCO-trained model (see here https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/detection_model_zoo.md): ")
            detection_model = load_model(model_name)
            run_inference(detection_model)

        else:
            print("Thank you.\nGoodbye.")
            break
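
Finally, a small addition so the script can be run directly from the command line:

if __name__ == "__main__":
    main()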

Give it a second to open up the webcam window. After the window opens, you should be able to do live inference. The Object Detection API works brilliantly. Think about what you have just done with a few lines of Python code -- this is amazing!

In this simple tutorial, we have used the TensorFlow Object Detection API together with OpenCV to do live inference with a webcam.


#EpicML

